Abstract

Record data are used to reduce the time and cost of running experiments (Doostparast
and Balakrishnan, 2010). It is important to check the adequacy of models upon which
inferences or actions are based (Lawless, 2003, Chapter 10, p. 465). Few works address
goodness of fit based on record data. Smith (1988) proposed a form of residual for
testing some parametric models. In most cases, however, the variation inherent in
graphical summaries is substantial, even when the data are generated by the assumed
model, and the eye cannot always determine whether features in a plot are within the
bounds of natural random variation. Consequently, formal hypothesis tests are an
important part of model checking (Lawless, 2003).

In this paper, Kolmogorov-Smirnov and Cramer-von Mises type goodness of fit tests
for record data are proposed. A new weighted goodness of fit test is also suggested. A
Monte-Carlo simulation study is conducted to derive the percentiles of the proposed
statistics. Finally, some real data sets are analyzed to illustrate the results.

Key Words: Cramer-von Mises Statistics; Exponential model; Goodness of fit test; 
Kolmogorov-Smirnov statistics; Likelihood ratio test; Record data; Weibull model. 



1 Introduction 

In reliability, we are concerned primarily with test data in which lifetimes of items that 
fail during the course of the test are recorded or with variables related in some way to 
item lifetimes. If the actual lifetime of every item in the sample is recorded, the data are 
complete data. To obtain complete data, it is necessary to continue the experiment until 
the last item on test or in service has failed. In cases where even a few items in the sample 
may have very long lifetimes, the experiment can go on for a very long period of time and, in
fact, well beyond the point at which the results may no longer be of any interest or use. 

* E-mail addresses: doostparast@math.um.ac.ir 






In such situations, it may be desirable to terminate the study prior to failure of all items 
under test. When observation is discontinued before all items have failed, we obtain
so-called censored data. A variety of forms of censored data arise in
practice; see, for example, Balakrishnan and Cohen (1991) and Cohen (1991).

A form of censored data that is often encountered in applications is the so-called record
data. As pointed out by Gulati and Padgett (1995), in industrial testing, meteorological
data, and some other situations, measurements may be made sequentially and only values
smaller (or larger) than all previous ones are recorded. Such data may be represented by
$(\mathbf{r},\mathbf{k}) := (r_1, k_1, r_2, k_2, \ldots, r_m, k_m)$, where $r_i$ is the
$i$-th record value, meaning a new minimum (or maximum), and $k_i$ is the number of trials
following the observation of $r_i$ that are needed to obtain a new record value (or to
exhaust the available observations). There are two sampling schemes for generating such
record-breaking data:

• (Inverse sampling scheme) Items are presented sequentially and sampling is terminated
when the $m$-th minimum is observed. In this case, the total number of items
sampled is a random number, and $K_m$ is defined to be one for convenience;

• (Random sampling scheme) A random sample $Y_1, \ldots, Y_n$ is examined sequentially
and successive minimum values are recorded. In this setting, $N^{(n)}$, the
number of records obtained, is random and, given a value of $m$, we have in this
case $\sum_{i=1}^{m} K_i = n$.
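
The random sampling scheme above can be sketched in a few lines of Python. The function name `lower_records` and the input values are hypothetical and chosen only for illustration; the counting convention (each $k_i$ counts the record itself plus the following trials that fail to break it) is an assumption made here so that the counts sum to the sample size.

```python
from typing import List, Sequence, Tuple


def lower_records(sample: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Extract lower record values r_i and counts k_i from a sequence.

    k_i counts the record itself plus the following trials that fail to
    break it, so that sum(k) == len(sample) under random sampling.
    """
    records: List[float] = []
    counts: List[int] = []
    for y in sample:
        if not records or y < records[-1]:
            records.append(y)   # a new minimum: start a new (r_i, k_i) pair
            counts.append(1)
        else:
            counts[-1] += 1     # not a record: one more trial for the current r_i
    return records, counts


# Illustrative input (hypothetical values, not taken from any data set).
r, k = lower_records([5.0, 7.2, 3.1, 4.4, 6.0, 1.8, 2.5])
```

Under the inverse sampling scheme the same loop would simply stop as soon as the $m$-th record is appended.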

A random variable $X$ is said to have an exponential distribution, denoted by
$X \sim Exp(\sigma)$, if its cumulative distribution function (cdf) is

$$F(x;\sigma) = 1 - \exp\left\{-\frac{x}{\sigma}\right\}, \quad x > 0, \ \sigma > 0, \qquad (1)$$

and the probability density function (pdf) is

$$f(x;\sigma) = \frac{1}{\sigma}\exp\left\{-\frac{x}{\sigma}\right\}, \quad x > 0, \ \sigma > 0. \qquad (2)$$

The exponential distribution is commonly used in many applied problems. It is a natural
model when studying a variable that can take on only positive values, such as the lifetime
of units. In some situations, the Weibull distribution is more suitable than the
exponential distribution (Nelson, 1985). The Weibull cdf, denoted by $W(\alpha,\sigma)$, is

$$F(x;\alpha,\sigma) = 1 - \exp\left\{-\left(\frac{x}{\sigma}\right)^{\alpha}\right\}, \quad x > 0, \ \alpha > 0, \ \sigma > 0, \qquad (3)$$

and hence the pdf is

$$f(x;\alpha,\sigma) = \frac{\alpha}{\sigma}\left(\frac{x}{\sigma}\right)^{\alpha-1}\exp\left\{-\left(\frac{x}{\sigma}\right)^{\alpha}\right\}, \quad x > 0, \ \alpha > 0, \ \sigma > 0. \qquad (4)$$

The scale parameter $\sigma$ is called the characteristic life because it is always the
63.2-th percentile. It determines the spread and has the same units as the failure times,
for example hours, months, cycles, and so forth. The parameter $\alpha$ is a unitless pure
number and determines the shape of the distribution. For $\alpha = 1$, the Weibull
distribution is the exponential distribution. The Weibull distribution appears very
frequently in practical problems when we observe data representing minimal values. For
example, the life of a capacitor is determined by the shortest-lived portion of its
dielectric. For many parent populations with a limited left tail, the distribution of the
minimum of independent samples converges to a Weibull distribution (Lawless, 2003).
Researchers often like to make parametric assumptions on the underlying distribution.
With this in mind, estimation of the mean of an exponential distribution based on record
data has been treated by Samaniego and Whitaker (1986) and Doostparast (2009). Hoinkes
and Padgett (1994) obtained the ML estimators from record-breaking data for the Weibull
model.
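
The statement that the scale parameter is always the 63.2-th percentile can be checked directly from the Weibull cdf; the one-line computation below is added for illustration and uses only the notation of the text.

```latex
F(\sigma;\alpha,\sigma)
  = 1 - \exp\left\{-\left(\frac{\sigma}{\sigma}\right)^{\alpha}\right\}
  = 1 - e^{-1} \approx 0.632
  \qquad \text{for every } \alpha > 0 .
```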

As pointed out by Lawless (2003, Chapter 10, p. 465), it is important to check the
adequacy of models upon which inferences or actions are based. In the area of goodness
of fit based on record data, the published literature is sparse, with only a few works in
this direction. Informal methods of model checking emphasize graphical procedures such
as probability and residual plots. Smith (1988) proposed a form of residual for testing
some parametric models. In most cases, however, the variation inherent in graphical
summaries is substantial, even when the data are generated by the assumed model, and the
eye cannot always determine whether features in a plot are within the bounds of natural
random variation. Consequently, formal hypothesis tests are an important part of
model checking.

Motivated by this, the aim of this paper is to provide some methods for model checking
on the basis of records. Specifically, suppose that the record data
$\{R_1, K_1, \ldots, R_m, K_m\}$ come from a population with parent cdf $F(\cdot)$. We
consider testing

$$H_0 : F(x) = 1 - \exp\left\{-\left(\frac{x}{\sigma}\right)^{\alpha}\right\}, \quad \forall\, x \in (0,+\infty), \qquad (5)$$

where $\alpha$ and $\sigma$ may be unknown positive constants. In other words, is the
Weibull model adequate to fit the data? The rest of this article is organized as follows.
Since the Weibull model has a wide variety of applications, in Section 2 the maximum
likelihood estimates (MLEs) of the unknown parameters in the Weibull model are obtained.
In Section 3, explicit expressions for the Kolmogorov-Smirnov (K-S) and Cramer-von Mises
(C-M) goodness of fit statistics are derived, and we propose a new modified goodness of
fit test which is more suitable than the K-S and C-M statistics for records. Critical
values of these statistics are obtained by a simulation study. In Section 4, the
exponential model is considered and a goodness of fit test for the exponential model
against the Weibull alternative is obtained. Finally, some numerical examples are given
to illustrate the results.

2 Fitting a Weibull model

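It can be shown that the likelihood function for the two sampling schemes is given by

$$L(\theta) = \prod_{i=1}^{m} f(r_i)\,\{1 - F(r_i)\}^{k_i - 1}, \quad 0 < r_m < \cdots < r_2 < r_1. \qquad (6)$$

Let us assume that the sequence $\{R_1, K_1, \ldots, R_m, K_m\}$ comes from the
$W(\alpha,\sigma)$-model. The corresponding likelihood function under either random or
inverse sampling, with $\theta = (\alpha,\sigma)$, is obtained as

$$L(\theta) = \alpha^{m}\,\sigma^{-m\alpha}\left(\prod_{i=1}^{m} r_i\right)^{\alpha-1}\exp\left\{-\sigma^{-\alpha}\sum_{i=1}^{m} k_i r_i^{\alpha}\right\}. \qquad (7)$$

After taking logarithms, we have

$$l(\theta) = m\log(\alpha) - m\alpha\log(\sigma) + (\alpha - 1)\sum_{i=1}^{m}\log(r_i) - \sigma^{-\alpha}\sum_{i=1}^{m} k_i r_i^{\alpha}. \qquad (8)$$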



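Throughout this paper "log" denotes the natural logarithm. One can easily show, by taking
derivatives, that the maximum of (8) for $m \ge 2$ is obtained by solving the equations

$$\hat\sigma = \left(\frac{1}{m}\sum_{i=1}^{m} k_i r_i^{\hat\alpha}\right)^{1/\hat\alpha} \qquad (9)$$

and

$$h(\hat\alpha) = \frac{1}{m}\sum_{i=1}^{m}\ln r_i, \qquad (10)$$

where

$$h(\alpha) = \frac{\sum_{i=1}^{m} k_i r_i^{\alpha}\ln r_i}{\sum_{i=1}^{m} k_i r_i^{\alpha}} - \frac{1}{\alpha}.$$

Equation (9) follows from setting $\partial l/\partial\sigma = 0$, and (10) from
substituting (9) into $\partial l/\partial\alpha = 0$. Equation (10) cannot be solved
explicitly and hence the MLEs must be found by numerical methods. These equations are
similar to equations (6.2) and (6.3) of Lehmann and Casella (1998, Ch. 6, p. 468). Hence,
one can show that these equations have a unique solution.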
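
As a numerical illustration of equations (9) and (10), the following is a minimal sketch that solves (10) by bisection and then evaluates (9). The function name `weibull_record_mle` and the choice of bisection are assumptions made here, not the authors' implementation; the sketch relies on $h$ being increasing in $\alpha$, which holds because its derivative is a weighted variance of the $\ln r_i$ plus $1/\alpha^2$.

```python
import math
from typing import List, Sequence, Tuple


def weibull_record_mle(r: Sequence[float], k: Sequence[int]) -> Tuple[float, float]:
    """Return (alpha_hat, sigma_hat) solving equations (9) and (10)."""
    m = len(r)
    if m < 2:
        raise ValueError("need at least two record values")
    log_r = [math.log(x) for x in r]
    mean_log = sum(log_r) / m
    log_max = max(log_r)

    def weights(alpha: float) -> List[float]:
        # k_i * (r_i / r_max)^alpha; the common factor r_max^alpha cancels in h
        return [ki * math.exp(alpha * (lr - log_max)) for ki, lr in zip(k, log_r)]

    def g(alpha: float) -> float:
        w = weights(alpha)
        ratio = sum(wi * lr for wi, lr in zip(w, log_r)) / sum(w)
        return ratio - 1.0 / alpha - mean_log   # h(alpha) - (1/m) * sum(ln r_i)

    lo, hi = 1e-8, 1.0
    while g(hi) < 0.0 and hi < 1e12:   # h is increasing: widen until g changes sign
        hi *= 2.0
    for _ in range(100):               # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    # equation (9), evaluated on the log scale to avoid overflow
    sigma = math.exp(log_max + math.log(sum(weights(alpha)) / m) / alpha)
    return alpha, sigma


# Record data of Table 3 (Example 1 of the text).
alpha_hat, sigma_hat = weibull_record_mle([1.34, 0.14, 0.09, 0.07, 0.02],
                                          [1, 22, 2, 1, 22])
```

The values returned for the record data of Table 3 can be compared with the estimates reported in Example 1.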






3 GOF for Weibull model

GOF tests can be based on comparing parametric estimates with their nonparametric
counterparts. Two well-known examples are the Kolmogorov-Smirnov (K-S) and the
Cramer-von Mises (C-M) statistics, defined by

$$D_n = \sup_{-\infty<x<+\infty}\left|\hat F_n(x) - F_0(x)\right|, \qquad (11)$$

and

$$W_n^2 = n\int_{-\infty}^{+\infty}\left\{\hat F_n(x) - F_0(x)\right\}^2 dF_0(x), \qquad (12)$$

respectively, where $F_0(x)$ is the hypothesized model while $\hat F_n(x)$ is the
corresponding nonparametric maximum likelihood estimate (NPMLE). On the basis of record
data arising from a random sample of size $n$, Samaniego and Whitaker (1988) obtained the
NPMLE of the survival function $\bar F(x) := 1 - F(x)$ as

$$\hat{\bar F}_n(x) = \prod_{i:\, r_{(i)} \le x}\frac{\sum_{j=i}^{m} k_{(j)} - 1}{\sum_{j=i}^{m} k_{(j)}}, \qquad (13)$$

where $r_{(0)} = 0$ and $r_{(1)} < r_{(2)} < \cdots < r_{(m)}$ are the observed record
values, ordered from smallest to largest, and $\{k_{(i)}\}$ are the induced order
statistics corresponding to the ordered record values $\{r_{(i)}\}$, that is,
$k_{(i)} = k_{m-i+1}$, $i = 1, 2, \ldots, m$. As mentioned by Samaniego and Whitaker
(1988), the NPMLE in (13) will perform poorly when estimating the right tail of the
actual distribution; we therefore suggest a new GOF statistic,

$$DS_n = n\int_{0}^{+\infty}\left\{\hat F_n(x) - F_0(x)\right\}^2\frac{1}{F_0(x)}\, dF_0(x). \qquad (14)$$

The basic idea for $DS_n$ is similar to that of the Anderson-Darling statistic: it
measures the distance between $\hat F_n(x)$ and $F_0(x)$ in the left tail region better
than the C-M statistic in (12). One may notice that, on the basis of record data, the
statistics $D_n$, $W_n^2$ and $DS_n$ are modified so that the supremum and integral are
over the range $y < r_1$. Sufficiently large values of $D_n$, $W_n^2$ or $DS_n$ provide
evidence against the hypothesized model. To calculate the test statistics, the following
proposition is helpful.

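Proposition 3.1 Let $R_1, K_1, \ldots, R_m, K_m$ be record data arising from a random
sample of size $n$. Then the statistics $D_n$, $W_n^2$ and $DS_n$ simplify to

$$D_n = \max_{1\le i\le m}\left\{\max\left\{\Phi_1\cdots\Phi_{i-1} - \bar F_0(r_{(i)}),\ \bar F_0(r_{(i)}) - \Phi_1\cdots\Phi_i\right\}\right\}, \qquad (15)$$

$$W_n^2 = \frac{n}{3}\sum_{i=1}^{m+1}\left[\left\{\Phi_1\cdots\Phi_{i-1} - 1 + F_0(r_{(i)})\right\}^3 - \left\{\Phi_1\cdots\Phi_{i-1} - 1 + F_0(r_{(i-1)})\right\}^3\right], \qquad (16)$$

and

$$DS_n = n\left[\sum_{i=1}^{m+1}\left\{\frac{F_0^2(r_{(i)}) - F_0^2(r_{(i-1)})}{2} + 2\left(\Phi_1\cdots\Phi_{i-1} - 1\right)\left(F_0(r_{(i)}) - F_0(r_{(i-1)})\right)\right\} + \sum_{i=2}^{m+1}\left(\Phi_1\cdots\Phi_{i-1} - 1\right)^2\left\{\ln F_0(r_{(i)}) - \ln F_0(r_{(i-1)})\right\}\right], \qquad (17)$$

respectively, where $r_{(0)} = 0$, $r_{(m+1)} = +\infty$, $\Phi_1\cdots\Phi_{i-1} = 1$ for
$i = 1$, and

$$\Phi_i = \frac{\sum_{j=i}^{m} k_{(j)} - 1}{\sum_{j=i}^{m} k_{(j)}}, \quad 1 \le i \le m.$$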

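Proof The proof of (15) is clear. For (16), we have

$$\begin{aligned}
W_n^2 &= n\int_{-\infty}^{+\infty}\left\{\hat F_n(y) - F_0(y)\right\}^2 dF_0(y) \\
&= n\sum_{i=1}^{m+1}\int_{r_{(i-1)}}^{r_{(i)}}\left\{\Phi_1\cdots\Phi_{i-1} - \bar F_0(y)\right\}^2 dF_0(y) \\
&= n\sum_{i=1}^{m+1}\int_{F_0(r_{(i-1)})}^{F_0(r_{(i)})}\left\{\Phi_1\cdots\Phi_{i-1} - 1 + u\right\}^2 du \\
&= \frac{n}{3}\sum_{i=1}^{m+1}\left[\left\{\Phi_1\cdots\Phi_{i-1} - 1 + F_0(r_{(i)})\right\}^3 - \left\{\Phi_1\cdots\Phi_{i-1} - 1 + F_0(r_{(i-1)})\right\}^3\right].
\end{aligned}$$

Similarly, one can show (17) and the desired result follows.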
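
For (17), the omitted computation can be sketched as follows, assuming that the weight in (14) is $1/F_0(x)$ and writing $a_i := \Phi_1\cdots\Phi_{i-1} - 1$ (a shorthand introduced here for illustration only).

```latex
\begin{aligned}
DS_n &= n\sum_{i=1}^{m+1}\int_{F_0(r_{(i-1)})}^{F_0(r_{(i)})}\frac{(a_i + u)^2}{u}\,du,
        \qquad a_i := \Phi_1\cdots\Phi_{i-1} - 1, \\
\int\frac{(a_i + u)^2}{u}\,du &= a_i^2\ln u + 2a_i u + \tfrac{1}{2}u^2 + \text{const}.
\end{aligned}
```

Since $a_1 = 0$, the logarithmic term is absent for $i = 1$, and summing the increments of the antiderivative over the $m+1$ intervals gives the three groups of terms in (17).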



□ 
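
The closed forms of Proposition 3.1 translate directly into code. The sketch below is an illustration rather than the authors' implementation: the function name `gof_statistics` is hypothetical, the fitted parameters are passed in by the caller, and the weight $1/F_0(x)$ is assumed for $DS_n$.

```python
import math
from typing import Sequence, Tuple


def gof_statistics(r: Sequence[float], k: Sequence[int],
                   alpha: float, sigma: float) -> Tuple[float, float, float]:
    """Return (D_n, W_n^2, DS_n) for record data under a fitted Weibull cdf."""
    n = sum(k)
    pairs = sorted(zip(r, k))          # ordered records r_(1) < ... < r_(m)
    m = len(pairs)

    # prod[i] = Phi_1 * ... * Phi_i, with the empty product prod[0] = 1
    tail = n                           # sum of k_(j) over j >= i
    prod = [1.0]
    for _, kj in pairs:
        prod.append(prod[-1] * (tail - 1) / tail)
        tail -= kj

    # F_0 at r_(0) = 0, r_(1), ..., r_(m), r_(m+1) = +infinity
    cdf = [0.0] + [-math.expm1(-((x / sigma) ** alpha)) for x, _ in pairs] + [1.0]

    d_n = 0.0
    for i in range(1, m + 1):          # equation (15)
        surv0 = 1.0 - cdf[i]
        d_n = max(d_n, prod[i - 1] - surv0, surv0 - prod[i])

    w2 = 0.0
    ds = 0.0
    for i in range(1, m + 2):          # equations (16) and (17)
        a = prod[i - 1] - 1.0
        lo, hi = cdf[i - 1], cdf[i]
        w2 += ((a + hi) ** 3 - (a + lo) ** 3) / 3.0
        ds += 2.0 * a * (hi - lo) + 0.5 * (hi ** 2 - lo ** 2)
        if i > 1:                      # a = 0 when i = 1, so there is no log term
            ds += a * a * (math.log(hi) - math.log(lo))
    return d_n, n * w2, n * ds


# Record data of Table 3 with the estimates reported in Example 1.
d_stat, w2_stat, ds_stat = gof_statistics([1.34, 0.14, 0.09, 0.07, 0.02],
                                          [1, 22, 2, 1, 22], 1.1815, 0.8181)
```

The three returned values can be compared with the statistics quoted in Example 1; small differences are to be expected because the estimates are entered to four decimals.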



Proposition 3.2 Assume that $H_0 : F_0(x) = 1 - \exp\{-(x/\sigma)^{\alpha}\}$ is true.
Conditionally on $\{N^{(n)} \ge 2\}$, the distributions of $D_n$, $W_n^2$ and $DS_n$ on
the basis of record data do not depend on $F_0$.

Proof Suppose $\{N^{(n)} \ge 2\}$. Let $R_i' = (R_i/\sigma)^{\alpha}$. Thus,
$R_1', K_1, \ldots, R_m', K_m$ come from a random sample with common distribution
function $W(1,1)$. The ML estimates on the basis of $R_1', K_1, \ldots, R_m', K_m$,
denoted by $\hat\alpha'$ and $\hat\sigma'$, are obtained by solving (9) and (10) with
$r_i$ replaced by $r_i'$. One can easily verify that $\hat\alpha = \alpha\hat\alpha'$.
This implies that

$$\hat\sigma = \sigma\,(\hat\sigma')^{1/\alpha}.$$

Hence, the estimate of the Weibull distribution function is obtained as

$$\hat F_0(x;\alpha,\sigma) = F_0(x;\hat\alpha,\hat\sigma) = 1 - \exp\left\{-\left(\frac{x}{\hat\sigma}\right)^{\hat\alpha}\right\} = 1 - \exp\left\{-\left(\frac{(x/\sigma)^{\alpha}}{\hat\sigma'}\right)^{\hat\alpha'}\right\}. \qquad (18)$$

Similarly to Liao and Shimokawa (1999), this equation indicates that
$F_0(x;\hat\alpha,\hat\sigma)$ depends on $x$ only through $(x/\sigma)^{\alpha}$, so that
its value at each record is independent of the "true values" of the parameters $\alpha$
and $\sigma$. This implies that $D_n$, $W_n^2$ and $DS_n$ do not depend on the "true
values" of $\alpha$ and $\sigma$ when the parameters are estimated by the MLEs. The
desired result follows. □
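
The step "one can easily verify" in the proof above can be written out as follows. This is a sketch added for illustration; it assumes that $h(\alpha)$ in (10) is the ratio $\sum_i k_i r_i^{\alpha}\ln r_i / \sum_i k_i r_i^{\alpha}$ minus $1/\alpha$, and it uses $a$ as a generic argument.

```latex
\begin{aligned}
\ln r_i' &= \alpha(\ln r_i - \ln\sigma), \qquad (r_i')^{a} = \sigma^{-\alpha a}\, r_i^{\alpha a}, \\
\frac{\sum_i k_i (r_i')^{a}\ln r_i'}{\sum_i k_i (r_i')^{a}} - \frac{1}{a} - \frac{1}{m}\sum_i \ln r_i'
 &= \alpha\left[\frac{\sum_i k_i r_i^{\alpha a}\ln r_i}{\sum_i k_i r_i^{\alpha a}}
    - \frac{1}{\alpha a} - \frac{1}{m}\sum_i \ln r_i\right], \\
(\hat\sigma')^{\hat\alpha'} = \frac{1}{m}\sum_i k_i (r_i')^{\hat\alpha'}
 &= \sigma^{-\hat\alpha}\,\frac{1}{m}\sum_i k_i r_i^{\hat\alpha}
  = \left(\frac{\hat\sigma}{\sigma}\right)^{\hat\alpha}.
\end{aligned}
```

The second line shows that $a$ solves (10) for the transformed records exactly when $\alpha a$ solves it for the original records, so $\hat\alpha = \alpha\hat\alpha'$; the third line, with $\hat\alpha/\hat\alpha' = \alpha$, then gives $\hat\sigma = \sigma(\hat\sigma')^{1/\alpha}$.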

Proposition 3.2 clarifies that the distributions of $D_n$, $W_n^2$ and $DS_n$, on the
basis of record data, can be calculated via simulation, without loss of generality, by
using a Weibull distribution with $\alpha = \sigma = 1$. Let $D_{n,\gamma}$,
$W^2_{n,\gamma}$ and $DS_{n,\gamma}$ denote the $\gamma$-th quantiles of the
distributions of $D_n$, $W_n^2$ and $DS_n$, on the basis of record data, respectively.
These tests reject the null hypothesis $H_0 : F(x) = 1 - \exp\{-(x/\sigma)^{\alpha}\}$ at
size $\gamma$ if the GOF statistic used exceeds its corresponding $(1-\gamma)$-th
quantile. Table 1 presents simulated critical values obtained by a Monte-Carlo method.
For this task, the MC simulation generated M = 100,000 record samples, and the values of
$D_n$, $W_n^2$ and $DS_n$ were calculated and sorted in increasing order. The critical
values of $D_n$, $W_n^2$ and $DS_n$ for several significance levels were then calculated.

Table 1: Percentiles of $D_n$, $W_n^2$ and $DS_n$ for GOF of Weibull model. Each block of
three columns corresponds to one sample size $n$; the values of $n$ are missing from the
source text.

γ      | D_n     W_n^2   DS_n    | D_n     W_n^2   DS_n    | D_n     W_n^2   DS_n    | D_n     W_n^2    DS_n
0.01   | 0.1758  0.0108  0.3819  | 0.1508  0.0786  0.9747  | 0.0858  0.7155  2.7572  | 0.0451  3.4983   10.9018
0.025  | 0.2008  0.0166  0.4546  | 0.1646  0.1354  1.0608  | 0.1047  0.9951  3.1844  | 0.0676  3.6557   11.6437
0.05   | 0.2253  0.0252  0.4963  | 0.1877  0.2124  1.1891  | 0.1394  1.1811  3.5657  | 0.1067  3.7911   12.3496
0.1    | 0.2584  0.0414  0.5499  | 0.2372  0.3524  1.4090  | 0.2109  1.3369  4.0448  | 0.1863  3.9573   13.4346
0.5    | 0.4445  0.2749  1.0480  | 0.5296  0.9664  2.4504  | 0.6430  3.1507  8.1438  | 0.7763  11.4929  38.7320
0.90   | 0.8093  0.8842  2.6889  | 0.8854  2.3140  8.9519  | 0.9322  5.4022  26.5699 | 0.9663  15.0385  98.0011
0.95   | 0.8627  1.0706  3.8012  | 0.9170  2.5707  11.5462 | 0.9502  5.7196  31.9874 | 0.9743  15.4166  110.7805
0.975  | 0.8846  1.1545  4.4511  | 0.9361  2.7348  13.7890 | 0.9611  5.9185  36.4863 | 0.9797  15.6718  121.8708
0.99   | 0.8976  1.2063  4.9176  | 0.9494  2.8530  15.8577 | 0.9704  6.0919  41.5800 | 0.9846  15.9096  135.2813
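
The Monte-Carlo procedure described above can be sketched as follows for the statistic $D_n$ alone. Everything in this block is an illustrative assumption rather than the authors' code: the function names, the bisection solver for (10), the random sampling scheme with lower records, the empirical-quantile rule, and the small number of replicates used to keep the run short.

```python
import math
import random
from typing import List, Sequence, Tuple


def lower_records(sample: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Lower records r_i and counts k_i (record plus trials that fail to break it)."""
    r: List[float] = []
    k: List[int] = []
    for y in sample:
        if not r or y < r[-1]:
            r.append(y)
            k.append(1)
        else:
            k[-1] += 1
    return r, k


def weibull_record_mle(r: Sequence[float], k: Sequence[int]) -> Tuple[float, float]:
    """Solve (10) for alpha by bisection, then obtain sigma from (9)."""
    m = len(r)
    log_r = [math.log(x) for x in r]
    mean_log, log_max = sum(log_r) / m, max(log_r)

    def weights(alpha: float) -> List[float]:
        return [ki * math.exp(alpha * (lr - log_max)) for ki, lr in zip(k, log_r)]

    def g(alpha: float) -> float:
        w = weights(alpha)
        return sum(wi * lr for wi, lr in zip(w, log_r)) / sum(w) - 1.0 / alpha - mean_log

    lo, hi = 1e-8, 1.0
    while g(hi) < 0.0 and hi < 1e12:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0.0 else (lo, mid)
    alpha = 0.5 * (lo + hi)
    sigma = math.exp(log_max + math.log(sum(weights(alpha)) / m) / alpha)
    return alpha, sigma


def d_n(r: Sequence[float], k: Sequence[int], alpha: float, sigma: float) -> float:
    """K-S type statistic (15) for the fitted Weibull survival function."""
    tail = sum(k)
    prod_prev, best = 1.0, 0.0
    for x, kj in sorted(zip(r, k)):
        surv0 = math.exp(-((x / sigma) ** alpha))
        prod_i = prod_prev * (tail - 1) / tail
        best = max(best, prod_prev - surv0, surv0 - prod_i)
        prod_prev, tail = prod_i, tail - kj
    return best


def simulate_dn_quantile(n: int, gamma: float, replicates: int, seed: int = 0) -> float:
    """Empirical gamma-th quantile of D_n under W(1, 1), given at least two records."""
    rng = random.Random(seed)
    values: List[float] = []
    while len(values) < replicates:
        sample = [rng.expovariate(1.0) for _ in range(n)]   # W(1, 1) is Exp(1)
        r, k = lower_records(sample)
        if len(r) < 2 or r[-1] <= 0.0:                      # condition on N^(n) >= 2
            continue
        alpha, sigma = weibull_record_mle(r, k)
        values.append(d_n(r, k, alpha, sigma))
    values.sort()
    return values[min(replicates - 1, max(0, math.ceil(gamma * replicates) - 1))]


if __name__ == "__main__":
    print(simulate_dn_quantile(n=20, gamma=0.95, replicates=1000))
```

The same loop extends to $W_n^2$ and $DS_n$ by storing those statistics alongside $D_n$; a run with far more replicates would be needed before comparing with tabulated percentiles.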

4 GOF for exponential model

As mentioned earlier, the model $W(\alpha,\sigma)$ reduces to the $Exp(\sigma)$ model when
$\alpha = 1$. Therefore, in this case, testing the hypothesis $H_0 : X \sim Exp(\sigma)$
against the alternative $H_1 : X \sim W(\alpha,\sigma)$ is equivalent to testing
$H_0 : \alpha = 1$ against the alternative $H_1 : \alpha \neq 1$. We could not find a UMP
test of size $\gamma$ ($0 < \gamma < 1$) for this hypothesis testing problem, and we
leave it as an open problem. Therefore, we use the generalized likelihood ratio (GLR)
procedure to test these hypotheses. From (3), (1) and (7), the likelihood ratio statistic
for testing $H_0 : \alpha = 1$ against the alternative $H_1 : \alpha \neq 1$ is given by

$$\Lambda = \frac{L(1,\hat\sigma_0)}{L(\hat\alpha,\hat\sigma)} = \left(\frac{1}{\hat\alpha}\right)^{m}\left(\frac{\sum_{i=1}^{m} k_i r_i^{\hat\alpha}}{\sum_{i=1}^{m} k_i r_i}\right)^{m}\left(\prod_{i=1}^{m} r_i\right)^{1-\hat\alpha}, \qquad (19)$$

where $\hat\alpha$ is obtained by solving equation (10) and is the maximum likelihood
estimate of $\alpha$ under $H_1$, while $\hat\sigma_0$ is the ML estimate of $\sigma$
under $H_0$ and is given by $\hat\sigma_0 = \sum_{i=1}^{m} k_i r_i / m$.

Proposition 4.1 When $\sigma$ is unknown, the critical region of the GLR test of level
$\gamma$ for testing $H_0 : \alpha = 1$ against the alternative $H_1 : \alpha \neq 1$ is
given by

$$C = \left\{(\mathbf{r},\mathbf{k}) : \Lambda \le C^{*}\right\},$$

with $\Lambda$ as in (19), where $\hat\alpha$ is the maximum likelihood estimate of
$\alpha$ under $H_1$ and $C^{*}$ is obtained from the size restriction

$$P_{H_0}\left(\Lambda \le C^{*}\right) = \gamma.$$

Under $H_0$, it can be shown that $-2\ln\Lambda$ has an asymptotic chi-square
distribution with one degree of freedom when the sample size $n$ goes to infinity; thus
$C^{*} \approx \exp\{-\tfrac{1}{2}\chi^2_{1,1-\gamma}\}$, where $\chi^2_{v,p}$ is the
$p$-th quantile of a chi-square distribution with $v$ degrees of freedom.
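
The GLR statistic and its asymptotic p-value can be evaluated with the short sketch below. The function name `glr_exponential_vs_weibull` is hypothetical, the estimate $\hat\alpha$ is assumed to have been computed separately from (10), and the likelihood ratio is taken as the maximized likelihood (7) at $\alpha = 1$ divided by the maximized likelihood under the Weibull model.

```python
import math
from typing import Sequence, Tuple


def glr_exponential_vs_weibull(r: Sequence[float], k: Sequence[int],
                               alpha_hat: float) -> Tuple[float, float]:
    """Return (-2 ln Lambda, asymptotic p-value) for H0: alpha = 1."""
    m = len(r)
    s_alpha = sum(ki * x ** alpha_hat for ki, x in zip(k, r))   # sum k_i r_i^alpha_hat
    s_one = sum(ki * x for ki, x in zip(k, r))                  # sum k_i r_i
    log_lambda = (-m * math.log(alpha_hat)
                  + m * math.log(s_alpha / s_one)
                  + (1.0 - alpha_hat) * sum(math.log(x) for x in r))
    stat = -2.0 * log_lambda
    # Survival function of a chi-square variable with one degree of freedom:
    # P(Z^2 > x) = erfc(sqrt(x / 2)) for a standard normal Z.
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Record data of Table 3 with the shape estimate reported in Example 1.
stat1, p1 = glr_exponential_vs_weibull([1.34, 0.14, 0.09, 0.07, 0.02],
                                       [1, 22, 2, 1, 22], 1.1815)
```

Rejecting when the p-value is below $\gamma$ is equivalent to rejecting when $\Lambda$ falls below the approximate threshold $C^{*}$ of Proposition 4.1.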
Table 2: Times (in minutes) between 48 consecutive calls (read row by row).

1.34  0.14  0.33  1.68  1.86  1.31  0.83  0.33
2.20  0.62  3.20  1.38  0.96  0.28  0.44  0.59
0.25  0.51  1.61  1.85  0.47  0.41  1.46  0.09
2.18  0.07  0.02  0.64  0.28  0.68  1.07  3.25
0.59  2.39  0.27  0.34  2.18  0.41  1.08  0.57
0.35  0.69  0.25  0.57  1.90  0.56  0.09  0.28


Table 3: Record data arising from times (in minutes) between 48 consecutive calls.
i     1     2     3     4     5
R_i   1.34  0.14  0.09  0.07  0.02
K_i   1     22    2     1     22



5 Illustrative examples

Example 1

Table 2 shows the times (in minutes) between 48 consecutive telephone calls to a company's
switchboard, as presented by Castillo et al. (2005). Assuming that the times between
consecutive telephone calls follow the exponential distribution $Exp(\sigma)$, Castillo
et al. (2005) obtained the MLE of $\sigma$ based on the complete data as
$\hat\sigma_c = 0.934$. The corresponding record data, obtained from these complete data,
are presented in Table 3. Assuming the $Exp(\sigma)$-model, the MLE of $\sigma$ on the
basis of the record data is $\hat\sigma_0 = 1.022$, while assuming the
$W(\alpha,\sigma)$-model, from (9) and (10), the MLEs of $\alpha$ and $\sigma$ are
$\hat\alpha = 1.1815$ and $\hat\sigma = 0.8181$, respectively. Table 4 is useful for
calculating the GOF statistics. From Table 4 we conclude that

$$D_n = 0.6979, \quad W_n^2 = 5.5140, \quad DS_n = 8.8604.$$

Letting $\gamma = 0.05$, from Table 1 the three approaches lead to acceptance of the
Weibull model for these data. For testing the exponential model against the Weibull
alternative, the GLR statistic gives $-2\ln\Lambda = 0.3896630654$, with p-value
0.5324766591. This supports the exponential assumption made by Castillo et al. (2005). A
graph of the likelihood function is given in Figure 1.

Table 4: GOF from times between 48 consecutive calls.
i  r_i   k_i  r_(i)  k_(i)  $\Phi_i$   $\hat{\bar F}_n(r_{(i)}) = \Phi_1\cdots\Phi_i$   $\bar F_0(r_{(i)}) = \exp\{-(r_{(i)}/\hat\sigma)^{\hat\alpha}\}$
1  1.34  1    0.02   22     (48-1)/48  47/48 = 0.9792                                   0.9876
2  0.14  22   0.07   1      (26-1)/26  (47/48)(25/26) = 0.9415                          0.9467
3  0.09  2    0.09   2      (25-1)/25  (47/48)(25/26)(24/25) = 0.9038                   0.9290
4  0.07  1    0.14   22     (23-1)/23  (47/48)(25/26)(24/25)(22/23) = 0.8646            0.8832
5  0.02  22   1.34   1      (1-1)/1    0                                                0.1667

Figure 1: Contour plot of likelihood function (7) on the basis of data in Table 3.

Table 5: Successive minima, plane 7914.
i     1   2   3   4
R_i   50  44  22  3
K_i   1   3   2   18

Example 2

Samaniego and Whitaker (1986) presented record data arising from successive failure times
of air conditioning units in Boeing aircraft; the data for plane 7914 consist of $n = 24$
failure times and are given in Table 5. They approximated these data by the
$Exp(\sigma)$-model and estimated the mean life $\sigma$ as $\hat\sigma_0 = 70$. Under
the $W(\alpha,\sigma)$-model, the MLEs of $\alpha$ and $\sigma$ are obtained as

$$\hat\alpha = 1.598743046, \quad \hat\sigma = 51.42746441,$$

respectively. Therefore, $-2\ln\Lambda = 1.580279376$, which gives the p-value
0.2087204561. This supports the exponential assumption of Samaniego and Whitaker (1986).
A graph of the likelihood function is given in Figure 2.

Figure 2: Contour plot of likelihood function (7) on the basis of data in Table 5
(horizontal axis: Sigma).

Table 6: Simulated record data from $W(\alpha = 4, \sigma = 1)$.
i     1      2      3      4
R_i   0.879  0.765  0.735  0.220
K_i   3      2      2      23

Example 3

Samaniego and Whitaker (1988) simulated a random sample of size $n = 30$ from the
$W(\alpha = 4, \sigma = 1)$-model; the record data arising from this sample are presented
in Table 6. Assuming the $Exp(\sigma)$-model, the MLE of the mean life $\sigma$ is
$\hat\sigma_0 = 2.67425000$. Assuming the $W(\alpha,\sigma)$-model, the MLEs of $\alpha$
and $\sigma$ are obtained as

$$\hat\alpha = 3.316071956, \quad \hat\sigma = 0.9728468503,$$

respectively. Therefore, $-2\ln\Lambda = 7.911804336$, which gives the p-value
0.0049113232. This supports a departure from the exponential assumption. A graph of the
likelihood function is given in Figure 3.

Figure 3: Contour plot of likelihood function (7) on the basis of data in Table 6
(horizontal axis: Sigma).

6 Concluding Remarks

In this paper, Kolmogorov-Smirnov and Cramer-von Mises type goodness of fit tests, as
well as a new weighted statistic, were proposed for record data. These statistics were
used to test the goodness of fit of the Weibull model. We suggest the following procedure
for analyzing record data. The first step is to test the Weibull model using the proposed
GOF tests in Section 3. If it is accepted, apply the GLR test in Section 4 for
exponentiality. If the exponential model is accepted, use the statistical procedures for
record data arising from the exponential model; see Samaniego and Whitaker (1986), Arnold
et al. (1998), Doostparast (2009) and Doostparast and Balakrishnan (2010). If
exponentiality is rejected, one can use the results of Hoinkes and Padgett (1994). If the
Weibull model is rejected, one can use the non-parametric results of Samaniego and
Whitaker (1988).

Following Samaniego and Whitaker (1988), one can consider the problem when the
available data arise from $L$ sequences of random variables. More precisely, assume
that $L$ independent samples

$$Y_{i1}, Y_{i2}, \ldots, Y_{i,n_i}, \quad 1 \le i \le L,$$

each of size $n_i$, are obtained sequentially from $F$. The resulting records are
$R_{i1}, K_{i1}, \ldots, R_{im_i}, K_{im_i}$ for $i = 1, 2, \ldots, L$, where
$K_{im_i} = n_i - \sum_{j=1}^{m_i - 1} K_{ij}$. Similarly, the NPMLE of the survival
function at point $t$ is obtained as

$$\hat{\bar F}(t) = \prod_{i:\, r_{(i)} \le t}\frac{\sum_{j=i}^{m^{*}} k_{(j)} - 1}{\sum_{j=i}^{m^{*}} k_{(j)}},$$

where $m^{*} = \sum_{i=1}^{L} m_i$, $\{r_{(i)}, i = 1, 2, \ldots, m^{*}\}$ are the ordered
observed record values in the $L$ samples combined and
$\{k_{(i)}, i = 1, 2, \ldots, m^{*}\}$ the induced order statistics for the associated
$k_{ij}$. To assess the impact of $L$ on the power of the GOF tests, one can conduct a
simulation study.